Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

Neural Information Processing Systems

We study the problem of system identification and adaptive control in partially observable linear dynamical systems. Adaptive and closed-loop system identification is a challenging problem due to correlations introduced in data collection. In this paper, we present the first model estimation method with finite-time guarantees in both open-loop and closed-loop system identification. Deploying this estimation method, we propose adaptive control online learning (AdapOn), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps. AdapOn estimates the model dynamics by occasionally solving a linear regression problem through interactions with the environment. Using policy re-parameterization and the estimated model, AdapOn constructs counterfactual loss functions to be used for updating the controller through online gradient descent. Over time, AdapOn improves its model estimates and obtains more accurate gradient updates to improve the controller. We show that AdapOn achieves a regret upper bound of $\text{polylog}\left(T\right)$ after $T$ time steps of agent-environment interaction. To the best of our knowledge, AdapOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of \textit{unknown} partially observable linear dynamical systems, a setting that includes linear quadratic Gaussian (LQG) control.
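The model-estimation step the abstract describes, solving a linear regression from interaction data, can be sketched in its simplest open-loop form: regress each output on a window of past inputs to recover the system's Markov parameters $C A^{k-1} B$. This is a generic illustration under assumed dynamics, not AdapOn itself; the system matrices, noise levels, and history length `H` below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy partially observable LDS (all matrices invented for illustration):
#   x_{t+1} = A x_t + B u_t + w_t,   y_t = C x_t + z_t
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

T, H = 5000, 5  # trajectory length, regression history length

# Open-loop data collection with i.i.d. Gaussian excitation inputs.
x = np.zeros(2)
U, Y = [], []
for _ in range(T):
    u = rng.standard_normal(1)
    y = C @ x + 0.1 * rng.standard_normal(1)          # observation noise z_t
    U.append(u)
    Y.append(y)
    x = A @ x + B @ u + 0.1 * rng.standard_normal(2)  # process noise w_t
U, Y = np.array(U), np.array(Y)

# Least squares: regress y_t on (u_{t-1}, ..., u_{t-H}) to estimate the
# Markov parameters G_k = C A^{k-1} B, k = 1..H.
Phi = np.array([U[t - H:t][::-1].ravel() for t in range(H, T)])
theta, *_ = np.linalg.lstsq(Phi, Y[H:].ravel(), rcond=None)

true = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(H)])
print("estimated:", np.round(theta, 3))
print("true     :", np.round(true, 3))
```

Closed-loop identification, which the paper's method also covers, is harder precisely because feedback makes the inputs correlated with past noise, so this plain regression no longer suffices as-is.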



propose the first finite-time system identification algorithm for partially observable linear dynamical systems (LDS)

Neural Information Processing Systems

We thank the reviewers for their effort and insightful comments during these unprecedented times. LQR and LQG are among the few continuous settings where optimal policies exist (and mostly have closed form) [1]. Therefore, we do not see why this paper would be less relevant to our community. If persistence of excitation (PE) is absent, we provide two general algorithms stated in Cor. The agent uses a warm-up period of $O(\sqrt{T})$, after which it commits to a controller yielding a regret of $\sqrt{T}$.


Review for NeurIPS paper: Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

Neural Information Processing Systems

To clarify, while PE (persistence of excitation) is a common assumption in the classical control literature, it is not common in more recent non-asymptotic work. If one were to assume PE in the state feedback setting, then injecting noise would not be necessary and better regret could be achieved -- but lower bounds tell us that this is not the case. So justifying the applicability of the assumption in this output feedback setting is crucial, and I'm happy to hear that it ends up being a mild assumption satisfied by well-known controllers.


Review for NeurIPS paper: Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

Neural Information Processing Systems

In discussion, the reviewers felt that the main result of the paper---that logarithmic regret is possible for LQG under sufficient observation noise---is significant and worth pointing out, especially given $\sqrt{T}$ lower bounds for the fully observable setting. The reviewers did feel that the framing of the results can be improved, and I encourage the authors to do this for the final version. In particular, (1) the result is not necessarily surprising given the noise assumptions, and it would be good to be more transparent about this, and (2) the claim (which is even present in the rebuttal) that the exploration scheme here is "strategic" in some way compared to prior results based on injecting random noise is very questionable, and it is indeed not clear that the techniques here can be extended beyond linear control.

